The technical objective of this research was to develop a computer method for
arranging a number of individual task patterns, representing job incumbents in a given
occupational area, into groups or clusters. This advanced computerized technique for
clustering work tasks produces homogeneous clusters of task patterns using an input
of tasks performed in a sample of jobs. These clusters represent the occupational
specialties that exist in a field of work. The important features of this technique are:
(1) its capacity for computer analysis of task patterns of large numbers of subjects,
(2) its capability for computer assistance in making research decisions at various
levels of task analysis, and (3) its flexibility as a tool of pattern recognition and
structuring. With only minor modification, the computer programs and concepts
described in this report should be of interest to those concerned with other
clustering, classifying, and taxonomic techniques. (CH)

BRIEF



This report describes an advanced computerized technique 
for clustering work tasks which was developed in the course 
of research being conducted by this Activity. The objective
of this research is to devise a method for determining the
basic technical skills needed to man current and future 
weapons and support systems in order to provide a basis for 
the Navy enlisted personnel classification structure required 
in the next decade. Progress and results pertaining to this 
broad research objective appear in another report series is- 
sued by this Activity. 

The primary purpose of this Technical Bulletin is to
provide research and staff organizations involved in task
analysis with a description of a new method for grouping task
patterns. With an input of tasks performed in a sample of 
jobs, this computerized technique produces a series of rela- 
tively homogeneous clusters of task patterns. These clusters 
represent the occupational specialties that exist in a field 
of work. 

The most important features of this technique are: (1)
its capacity for computer analysis of task patterns of large
numbers of subjects; (2) its capability for computer assist-
ance in making research decisions at various levels of task
analysis; and (3) its flexibility as a tool of pattern recog-
nition and structuring.

In addition, this report should be of interest to those 
concerned with other clustering, classifying, and taxonomic 
techniques. The same basic problem of clustering phenomena 
by some criterion of similarity is encountered by physicists, 
mathematicians, computer designers, bio-medical engineers, 
information theorists, and others. With only minor modifi- 
cations, the computer programs and concepts described in this 
report could be of value in these other fields. 
A COMPUTER TECHNIQUE FOR CLUSTERING TASKS



I. INTRODUCTION



A. Background 

The purpose of this research is to develop a method for determining 
the basic technical skills and their levels required for current opera- 
tional weapons and support systems and for future weapons and support 
systems which will be introduced into operational use in the Navy during 
the next decade. Initial emphasis is being placed upon current skill 
requirements. Subsequent phases will deal with skill requirements gen- 
erated by future technological developments. 

The ultimate application of the method developed in this research 
will be the determination and description of work requirements so as to 
ensure their placement in the enlisted personnel classification struc- 
ture in a meaningful and systematic manner. The achievement of this 
objective will permit the removal, replacement, and rearrangement of 
work requirements as this becomes necessary due to obsolescence of 
certain types of work, changes in others, and the addition of new work 
requirements associated with technological change. 

As the initial step in this method development phase, a pilot study 
is being conducted of the engineering department in destroyers in order 
to determine the feasibility of the research approach and the efficacy 
of the associated research instruments. A report on this research was 
published in May 1965 (11) in which the overall concepts and research
design as well as the progress to date in the pilot study are described. 
Readers interested in a more complete understanding of the framework of 
this research should consult that report. 

B. Research Framework 

The central concept of the research methodology in this study is 
that the performance of a given task or group of tasks is a function of 
the technical, organizational, and communicational dimensions of the 
work situation. Accordingly, major emphasis has been placed on the 
elaboration of work requirements in terms of a number of variables 
which are descriptive of each of these three dimensions. It is hypoth- 
esized that different occupational specialties will exhibit character- 
istic technical, organizational, and communicational patterns. The 
acronym, SAMOA, Systematic Approach to Multidimensional Occupational 
Analysis, has been adopted to label this approach. 

The application of a multidimensional approach to occupational
analysis involves, among other things, the analysis of task patterns
in terms of the technical, organizational, and communicational dimen-
sions of the work situation. The term "task pattern" is defined as the
total alignment of different tasks performed by a given individual or
set of individuals in a work situation.

Before any analysis of the variables associated with these dimen-
sions can begin, a basis for this analysis must be provided. Thus, in
characterizing work requirements, it is first necessary to designate
the substance and form of the work. In this research, work requirements
initially take the form of a series of homogeneous and related tasks.



It is the purpose of this report to describe a technique, developed 
in this research, which can be used to group individual task patterns on 
the basis of their similarity. These groups or "clusters" of task pat- 
terns, when amplified by other variables, will help to provide the 
framework of work requirements necessary for a personnel classification
structure.

C. Occupational Research and Task Analysis

The problem of grouping tasks and jobs for occupational classifi- 
cation has been approached from a variety of directions. It is the 
purpose of a particular method of analysis that serves as the prime 
criterion in choosing among alternative techniques. For instance, the 
manifold approaches to "job evaluation" (4_,£) are all ultimately con- 
cerned with the assessment of jobs to determine their relative worth in 
establishing a balanced wage structure. For demographic purposes, the 
Bureau of the Census has approached classification in terms of broad 
occupational categories designed for general use (8). The Dictionary
of Occupational Titles (14) provides another approach to occupational
classification, employing categories which differentiate on the basis
of skill level, subject matter/industry, and process/activity. This
structure, like the Bureau of the Census classification, is designed
for nation-wide application, particularly by guidance counselors.

These methods of grouping occupations have certain characteristics
that detract from their use in this research study. Specifically, they
share a common feature: the job, not the task, is treated as the basic
unit of analysis. In classifying work at the job level, certain
assumptions are made concerning the arrangement of work. In particular,
it is assumed that "jobs" exist in the conventional sense, rather than
a series of task patterns that adhere to different positions depending
on the specific work situation. Moreover, such approaches frequently
assume that the task patterns associated with certain job titles are
relatively constant and, therefore, that the job can be used at the
finest level of analysis.

Whatever their virtues, job level analyses are inappropriate in the 
context of naval occupational classification because of the unique work 
situation aboard ships. In order to effectively classify technical 
skills in naval occupations, it is necessary to approach the problem in 
terms of task analysis. 










As with occupational analysis, the variety of techniques available 
for grouping tasks is considerable. Nevertheless, the purposes of this 
research impose a number of constraints or requirements on the kind of 
technique that can be used. First, tasks can be grouped by various 
criteria independent of their technical pattern of performance. For 
example, in a previous report on this research project (10), tasks were
classified in terms of their technical complexity. Tasks have also
been classified by their behavioral content (6), as stimulus-response
events (2), in terms of learning demands (5), as man-machine elements
(12), and in combinations of the above (3). In this research, tasks
are associated by their pattern of technical performance and these pat- 
terns are grouped by their similarity of task content. Thus, the re- 
quirement for grouping technical task patterns as performed on board 
naval vessels results in constraints on the analytical techniques that 
can be employed. 



A second constraint is concerned with the requirement to analyze 
large numbers of task patterns simultaneously. Conventional "job
analysis" employs intensive, direct methods of obtaining occupational
information. Because of the expense involved in personal contact with
the job over extended periods of time, and because of the limitations
in occupational coverage possible, survey methods of obtaining task
pattern information are preferable for large-scale task analysis (9).



There are other considerations in this research that encouraged
the development of a new approach to task pattern analysis. The feasi-
bility of large-scale occupational analysis on a Navy-wide basis is
dependent, in large part, upon the analytical speed and operational
simplicity of the techniques to be employed. Thus, considerations of
practicability encouraged the use of computer techniques. Also, it was
advisable to minimize the amount of analytical bias introduced by con-
temporary occupational groupings in the Navy or other existing classi-
fication structures. It was similarly desirable to minimize the number
of judgmental and inferential decisions that would have to be made in
grouping combinations of tasks.



Because of the constraints imposed by the purpose of this research,
it appeared feasible and desirable to develop a method which would be
quantitative in approach and computerized in process. This would maxi-
mize the criteria of analytical speed and occupational scope, and pro-
vide for computer-assisted research procedures as well.






II. COMPUTER CLUSTERING TECHNIQUE 



The decision to employ computerized methods of determining the 
"natural" task groupings in an occupational area led to a search of 
the research literature for possible techniques. Unfortunately, most 
of the existing methods are not easily adaptable to a wide range of 
research problems. Again, the purpose of the research dictates the
limitations of the methods employed. Also, computer "software"
technology has not advanced to the point that complex programs, de-
signed for particular research objectives and written in a particular
language for a specific computer, can be adapted with facility to
other research purposes and other computers.

For these reasons, a new computer clustering technique was devel-
oped in response to the specific research problem involved in this
study. Although many of its features are unique, there are some
points which coincide with existing methods of analysis. A selected
bibliography of some of these approaches to "clustering," "pattern
recognition," "profile analysis," "factor analysis," and other group-
ing procedures is contained in the last section of this report.

A. Technical Objective

The technical objective of the initial phase of this research was 
to develop a computer method for arranging a number of individual task 
patterns, representing job incumbents in a given occupational area, 
into groups or "clusters." A "cluster" is defined as a group of re- 
spondents characterized by relatively small differences in the kinds 
of tasks performed. In pursuing this approach, an iterative computer 
clustering technique was devised to group similar task patterns into 
homogeneous occupational segments or clusters. This technique encom- 
passes a series of computer programs that facilitate the process of 
grouping task patterns and provide a variety of outputs designed to 
carefully regulate and control the entire procedure at any step in the 
process. The data collection procedures used to obtain input data, 
and the data processing procedures employed to obtain clusters of task 
patterns, are set forth in the following sections. 

B. Data Collection 



A number of data collection instruments were devised to obtain in- 
formation on the variables associated with the three dimensions of work 
requirements being studied in this research. These included super- 
visors' questionnaires, work contact questionnaires, task lists, and 
others, but only the task lists are of concern for purposes of the 
present report. 

In developing the Task List Questionnaires, a comprehensive list of
tasks performed by engineering department personnel was first developed.
In its final form, this list consisted of over 500 separate items. This
list was then divided into three broad work areas in engineering that
appeared to be fairly discrete in terms of the work performed and the
equipments involved. These are: (1) the Propulsion/Auxiliary area,
encompassing work generally performed by personnel in the occupational
fields of Boilerman (BT), Boilermaker (BR), Machinist's Mate (MM), and
Engineman (EN); (2) the Hull/Repair area, including the work of the
Damage Controlman (DC), Shipfitter (SF), and Machinery Repairman (MR);
and (3) the Electrical area, covering the tasks performed by Electri-
cian's Mates (EM) and Interior Communications Electricians (IC).
Within each major area, the task list is divided into subheadings which
indicate the main categories of equipment operated and maintained in 
that area. The instruction page and one sample page of the Task List 
Questionnaire, as administered to personnel in the Hull/Repair work 
area, are contained in Appendix A. 

These task lists were administered to about 400 engineering depart- 
ment personnel in a sample of six destroyers in the San Diego-Long Beach 
area. This represents 76% of all personnel in these departments. Each
man completed only that task list which pertained to his area of work. 

Prior to computer processing, the task patterns of respondents, as
checked on the source document (Task List Questionnaire), were key 
punched on cards. These cards, indicating the tasks performed by each 
individual, comprise the computer input. 

C. Initial Computer Processing Procedures 

The problem faced at this point was how to group the task patterns 
of respondents so that clusters of similar tasks and task patterns 
would emerge in a form suitable for use in determining the technical 
work requirements of a given occupational area. 

This was accomplished in a number of steps. First, an index was 
developed to indicate the similarity of each individual’s task pattern 
with that of every other individual in the sample. Second, various 
respondents were selected as "pivot men" on the basis of their task 
pattern variance, and other individuals were clustered around the pivot 
men by task pattern similarity. Third, the resulting clusters were 
analyzed by use of other computer routines in order to develop "opti- 
mum specialty clusters." Fourth, analytical procedures and computer 
programs were revised on the basis of the preceding analysis to refine 
both the technique and the data. This technique represents an inter- 
play of mathematics, computer analysis, and human judgment. The steps 
employed in this procedure are described in more detail in the follow- 
ing sections. 



Similarity Index 



Prior to the actual clustering process, each individual's pattern
of tasks was compared to the task pattern of every other individual who
completed the same task area questionnaire. An index of similarity*
was then computed for each pair of individuals based on the relative
similarity of the tasks they performed.

This index is provided by:

$$S(i,j) = \frac{n\{T(i,j)\}}{n\{T(i)\} + n\{T(j)\} - n\{T(i,j)\}}$$

where

n{T(i,j)} is the number of tasks performed by both man i and man j

n{T(i)} is the number of tasks performed by man i

n{T(j)} is the number of tasks performed by man j

The denominator in this expression represents the total number of dif-
ferent tasks performed by i and j combined.

This formula generates a continuum ranging from "0," indicating
total independence (i.e., no tasks in common between man i and man j),
to "1," indicating complete identity (i.e., all tasks performed by i
are identical to those performed by j).



*This index is referred to conceptually in another form as a
"Coefficient of Compositional Similarity" (CCS), in which

$$\mathrm{CCS} = \frac{\mathit{Id}}{\mathit{Id} + \mathit{Un}_1 + \mathit{Un}_2}$$

where Id = number of tasks identical between Man 1 and Man 2

Un1 = number of tasks unique to Man 1

Un2 = number of tasks unique to Man 2

The CCS is an inversion of a formula originally termed the "Coefficient
of Compositional Uniqueness." It was used to determine overlapping
patterns of acquaintances among neighbors in a study by Carr (1), which
partially replicated previous research performed by Sweetser (13).
For example, consider the following comparison of task patterns in
which T(i) contains 10 elements or tasks and T(j) contains 15 elements:

T(i) = {A03,A14,A15,A19,B05,B17,C21,D04,E09,E10}

T(j) = {A03,A12,A14,A17,A19,B01,B02,B03,B17,C15,E10,E17,T01,T02,T03}

T(i,j) = {A03,A14,A19,B17,E10}

Note that man i has performed 10 tasks (indicated by the alphanumeric
codes), 5 of which are common with man j, who lists 15 tasks performed.
Applying the formula,* we have:

$$S(i,j) = \frac{5}{10+15-5} = \frac{5}{20} = .25 \quad \text{(or 16/64ths)}$$

It was desirable to convert the quotient into 64ths because of computer
processing requirements, although this is of no consequence in any sub-
sequent stage.
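The worked example above can be checked with a short Python sketch. The function names `similarity` and `to_64ths` are illustrative and do not come from the report's programs, and truncation to whole 64ths is an assumption, since the report does not state a rounding rule.

```python
from fractions import Fraction

def similarity(tasks_i, tasks_j):
    """S(i,j): tasks shared by both men divided by all distinct tasks of the pair."""
    shared = len(tasks_i & tasks_j)
    combined = len(tasks_i) + len(tasks_j) - shared
    return Fraction(shared, combined) if combined else Fraction(0)

def to_64ths(s):
    """Express a similarity in whole 64ths (truncation is an assumed rule)."""
    return int(s * 64)

# Task patterns taken from the example in the text.
T_i = {"A03", "A14", "A15", "A19", "B05", "B17", "C21", "D04", "E09", "E10"}
T_j = {"A03", "A12", "A14", "A17", "A19", "B01", "B02", "B03", "B17",
       "C15", "E10", "E17", "T01", "T02", "T03"}

s = similarity(T_i, T_j)
print(s, to_64ths(s))  # 1/4 16
```

Exact fractions are used here so that the quotient 5/20 and its conversion to 16/64ths can be compared without floating-point error.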

This formula was applied to every possible pair of respondents in
each task area and a matrix of mutual similarities was then generated
by the computer.† The size of the matrix is determined by the number
of personnel associated with each of the three task lists. Thus, the
Propulsion/Auxiliary Task List Questionnaire, which was administered
to 278 personnel, generated a semi-matrix with m(m-1)/2 or 38,503 dis-
tinct similarities, where m equals the number of personnel. The Hull/
Repair list produced a semi-matrix of 741 (i.e., 39(38)/2) indices and
the Electrical list resulted in 2,775 (i.e., 75(74)/2).
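The pair counts quoted above follow directly from m(m-1)/2, and a semi-matrix can be sketched as a mapping from each unordered pair of respondents to an index. The following is a minimal illustration, not the report's program: the three-man sample and its identification codes are hypothetical, and truncating to whole 64ths is an assumption.

```python
from itertools import combinations

def pair_count(m):
    """Number of distinct pairs among m respondents: m(m-1)/2."""
    return m * (m - 1) // 2

def semi_matrix(patterns):
    """Map each unordered pair of respondent codes to its similarity in 64ths."""
    result = {}
    for a, b in combinations(sorted(patterns), 2):
        shared = len(patterns[a] & patterns[b])
        combined = len(patterns[a] | patterns[b])
        # Truncation to whole 64ths is an assumed rounding rule.
        result[(a, b)] = 64 * shared // combined if combined else 0
    return result

for area, m in [("Propulsion/Auxiliary", 278), ("Hull/Repair", 39), ("Electrical", 75)]:
    print(area, pair_count(m))  # 38503, 741 and 2775 respectively

# Hypothetical three-man sample, for illustration only.
sample = {"M1": {"A03", "A14"}, "M2": {"A14", "B17"}, "M3": {"C21"}}
print(semi_matrix(sample))
```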



*In set notation:

$$S(i,j) = \frac{n\{T(i) \cap T(j)\}}{n\{T(i) \cup T(j)\}}$$

where

T(i) is the set of tasks performed by man i; similarly for T(j)

n{T(i) ∩ T(j)} represents the number of tasks in the "intersection" of
the task lists (patterns)

n{T(i) ∪ T(j)} represents the number of tasks in the "union" of the
task lists, that is, the set of tasks that belong to either or both
lists

†This index also provides the basic data for other indices used in
development of the clustering technique; for instance, the Cluster Veri-
fication Score (CVS), Vector Verification Score (VVS), and Cluster
Distance Score (CDS).
The similarity matrix comes in the form of a listing in which each
individual is listed in serial order by identification code and all
other personnel are compared with that individual by an index of simi-
larity. For reference purposes, these data were converted to a
computer-produced semi-matrix. Aside from the similarity listing and
semi-matrix, the similarity indices are recorded in another form, that
of a frequency distribution. For each of the three task lists, a dis-
tribution of indices was printed out in an 8 x 8 table. Examples of
the initial listing, the semi-matrix, and the frequency distribution 
are contained in Appendix B. 



Pivot Selection 



In order to group the tasks performed by personnel in this sample,
a starting point was necessary. In the initial computer clustering
technique, this point is provided by a "pivot man" or, simply, "pivot."
The pivot is the reference point for the entry of other personnel into
a cluster. The selection of pivots is controlled by the variance of
each individual's similarity indices, where the variance is computed
by:

$$s^2 = \frac{n\sum X^2 - (\sum X)^2}{n(n-1)} = \frac{\sum (X - \bar{X})^2}{n-1}$$

where

X = similarity index of man i with man j, or S(i,j)

n = number of similarity indices of man i with all j

$\bar{X}$ = mean of similarity indices of man i

One of the outputs of this phase of data processing is a variance 
listing for each task list, as shown in Appendix B. 



After the calculation of each variance, the individual with the 
highest variance is selected as the first pivot and becomes the refer- 
ence point or core of the first cluster of task patterns. The ration- 
ale for this procedure is as follows. One of the requirements of 
clustering tasks is that the clusters be sizable, but also separate 
and distinct. A large variance indicates the presence of highly
similar and highly dissimilar task patterns in a given individual's
range of similarities, the maximum variance occurring where a man has
one-half of his similarities = 0, and one-half = 1.










High variance is employed as the criterion for pivot selection for
two reasons. First, a pivot candidate’s high variance indicates that
his task pattern is very similar to those of some individuals, which
assures that a relatively homogeneous cluster can be formed. Second, 
high variance also means that the pivot candidate’s pattern of tasks 
differs greatly from those of other personnel, thus enabling the initial 
cluster to be distinct from at least a portion of the body of remaining 
tasks. As a result, succeeding clusters can be formed around pivots 
that are distinct from previous clusters. 



A simplified example of the relationship between an individual’s 
range of similarities and his variance is shown in Table 1. 



TABLE 1

Calculation of Variance for Two Task Pattern Samples

Man    Similarity Index (X)    X - Mean    (X - Mean)^2    s^2

 i             03                -17           289
               20                  0             0
               37                 17           289
       Sum     60                                          578/2 = 289
       [Mean S(i,j) = 60/3 = 20]

 j             15                 -5            25
               20                  0             0
               25                  5            25
       Sum     60                                          50/2 = 25
       [Mean S(i,j) = 60/3 = 20]



This example shows the case of two personnel, each with a mean similar- 
ity index of 20 and a list of similarity indices with three other per- 
sonnel. For man i, the similarities are both high and low (37 and 03, 
respectively), while for man j the similarities are grouped around the
average (i.e., 15, 20, and 25). Using the deviation form,

$$s^2 = \frac{\sum (X - \bar{X})^2}{n-1},$$

the variance for man i is 289 while for man j, only 25. The two cases
in this example are exaggerated to show the effect of variance in the 
selection of pivots, but the computer process is approximately the 
same. In terms of this computer program, man i is the better choice 
for pivot since highly similar task patterns (as represented by the 
S(i,j) of 37) can be clustered with him and still make provision for 
clustering other task patterns that are distinct (as represented by 
the S(i,j) of 03). 
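The two variance forms given earlier can be checked against the Table 1 figures with a few lines of Python. This is only a sketch for verification; the function names are illustrative and are not taken from the report's programs.

```python
def variance_raw(xs):
    """Raw-score form: [n*sum(X^2) - (sum X)^2] / [n(n-1)]."""
    n = len(xs)
    return (n * sum(x * x for x in xs) - sum(xs) ** 2) / (n * (n - 1))

def variance_dev(xs):
    """Deviation form: sum((X - mean)^2) / (n-1)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

# Similarity indices (in 64ths) from Table 1.
man_i = [3, 20, 37]
man_j = [15, 20, 25]

print(variance_raw(man_i), variance_dev(man_i))  # 289.0 289.0
print(variance_raw(man_j), variance_dev(man_j))  # 25.0 25.0

# The candidate with the larger variance is preferred as pivot.
variances = {"i": variance_dev(man_i), "j": variance_dev(man_j)}
print(max(variances, key=variances.get))  # i
```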
Cluster Grouping



After the variance is computed for each individual, and the first 
pivot is selected (representing the greatest variance), the initial 
cluster is produced by selecting those individuals with a similarity 
to the pivot man above a certain threshold. A "similarity threshold" 
(ST) was set for each computer run in order to control the process of 
clustering task patterns. This threshold represents the minimum simi- 
larity acceptable for inclusion in a cluster and is regulated by a 
"control percentage" (CP) . By setting the ST at various values, the 
size and homogeneity of clusters can be regulated. 



As noted previously, a frequency distribution of similarity indices 
is derived from the similarity matrix and printed out in an 8 x 8 table, 
with each cell representing 1/64th of the distribution. This listing
was converted to a more conventional form for determining the similar- 
ity threshold to be used for each computer run. Table 2 shows the 
distribution of similarity indices for each of the three task areas. 



Using the similarity distribution for the Hull/Repair area (m=39)
in Table 2 as an example, the procedure for determining the similarity
threshold can be delineated. If, for instance, the control percentage
was set at 10%, a frequency count of the 741 similarities would begin
at the bottom of the table and continue until 10% or 74 similarities
had been counted. Note that this count ends in the frequency class of
35/64ths. The ST is thus set at 35, and the computer then generates
a cluster of personnel whose similarity to the pivot is greater than the
ST (i.e., ≥36). The resultant cluster listing contains the frequency
distribution, the control percentage and ST, the identification code
of the pivot, and the identification codes of all cluster members with
their similarity indices to the pivot above the threshold. A partial
sample cluster listing for the Hull/Repair area is shown on page 13.
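The threshold procedure just described can be sketched as a count downward from the highest similarity class. The function below is an illustration under stated assumptions, not the report's program: its name and parameters are hypothetical, the target count is taken as the control percentage of the total rounded to a whole number (74 of 741 in the text), and the frequencies are the non-zero Hull/Repair classes from 35/64ths upward as read from Table 2.

```python
def similarity_threshold(freq, total, control_pct):
    """Count down from the highest similarity class until control_pct of all
    pairs has been counted; return the class in which the count ends."""
    target = round(total * control_pct / 100)
    counted = 0
    for cls in sorted(freq, reverse=True):
        counted += freq[cls]
        if counted >= target:
            return cls
    return min(freq)  # fewer pairs listed than the target calls for

# Hull/Repair frequencies for classes 35 and above (zero classes omitted).
hull_tail = {35: 21, 36: 11, 37: 15, 38: 7, 39: 11, 40: 7, 41: 5, 42: 1,
             43: 3, 44: 3, 45: 2, 46: 2, 47: 1, 51: 1, 52: 1}

print(similarity_threshold(hull_tail, total=741, control_pct=10))  # 35
```

Classes 36 and above hold 70 similarities, so the 74th falls in class 35, which agrees with the ST quoted in the text.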



Once the first pivot is selected and the members of the first 
cluster are chosen from those personnel with similarities to the pivot
> ST, the computer initiates the selection of the second cluster.
This is accomplished by setting the variances of all members of the
first cluster to zero so that they will be ineligible to become pivots 
in succeeding clusters. The second pivot is then selected as the high- 
est remaining variance, and a second cluster of similarities > ST is 
generated and printed out. As before, the variances of all personnel 
in the second cluster are set to zero and the third pivot is obtained 
by again selecting the pivot candidate with the highest variance. The 
procedure is reiterated and clusters are produced until a pivot candi- 
date cannot cluster at least one other individual with a similarity to 
the pivot higher than the threshold. 
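The iteration described above can be summarized in a short Python sketch. It is a reading of the prose under stated assumptions rather than the original program: the similarity data are a hypothetical five-man sample, cluster members are drawn from all personnel (so overlapping membership is possible), and a run also stops once no one with a non-zero variance remains.

```python
from statistics import variance

def symmetric(pairs):
    """Build a two-way lookup sim[a][b] from unordered pair similarities."""
    sim = {}
    for (a, b), s in pairs.items():
        sim.setdefault(a, {})[b] = s
        sim.setdefault(b, {})[a] = s
    return sim

def cluster_run(sim, threshold):
    """Return (pivot, members) clusters in order of formation for one ST."""
    var = {a: variance(list(row.values())) for a, row in sim.items()}
    clusters = []
    while any(v > 0 for v in var.values()):
        pivot = max(var, key=var.get)              # highest remaining variance
        members = [b for b, s in sim[pivot].items() if s > threshold]
        if not members:                            # pivot clusters no one: stop
            break
        clusters.append((pivot, members))
        for person in [pivot, *members]:           # clustered men cannot be pivots
            var[person] = 0
    return clusters

# Hypothetical similarities in 64ths, for illustration only.
pairs = {("A", "B"): 40, ("A", "C"): 38, ("A", "D"): 2, ("A", "E"): 3,
         ("B", "C"): 36, ("B", "D"): 4, ("B", "E"): 5,
         ("C", "D"): 3, ("C", "E"): 2, ("D", "E"): 45}

print(cluster_run(symmetric(pairs), threshold=35))
# [('A', ['B', 'C']), ('D', ['E'])]
```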




TABLE 2

Frequency Distribution of Task Pattern
Similarities in Three Task Areas

Similarity Index   Propulsion/Auxiliary   Hull/Repair   Electrical
    (64ths)                 f                  f             f

        0                  2439                  4             70
        1                  1928                  5             51
        2                  2090                 10             93
        3                  2084                 12             85
        4                  2275                 11            132
        5                  2116                 15            152
        6                  2022                 22            159
        7                  1976                 16            137
        8                  2167                 19            146
        9                  1856                 20            118
       10                  1795                 21            104
       11                  1700                 14             89
       12                  1559                 19             73
       13                  1404                 21             74
       14                  1283                 24             64
       15                  1028                 22             53
       16                  1174                 33             81
       17                   944                 18             66
       18                   867                 14             67
       19                   763                 22             85
       20                   700                 19             66
       21                   619                 12             43
       22                   587                 22             59
       23                   446                 14             67
       24                   483                 20             64
       25                   386                 19             63
       26                   354                 28             55
       27                   295                 16             62
       28                   264                 27             53
       29                   198                 19             52
       30                   151                 21             53
       31                    98                 19             31
       32                   139                 29             49
       33                    72                 24             25
       34                    67                 20             33
       35*                   40                 21*            28
       36                    37                 11             22
       37                    33                 15              9
       38                    17                  7              9
       39                    13                 11              7
       40                     6                  7              3
       41                     8                  5              7
       42                     6                  1              3
       43                     2                  3              3
       44                     5                  3              2
       45                     2                  2              1
       46                     0                  2              1
       47                     1                  1              1
       48                     0                  0              1
       49                     0                  0              1
       50                     0                  0              0
       51                     0                  1              0
       52                     1                  1              1
       53                     0                  0              0
       54                     0                  0              1
       55                     1                  0              0
       56                     1                  0              0
       57                     0                  0              1
       58                     1                  0              0

    Total                 38503                741           2775
[Partial Cluster Listing (Hull/Repair Task Area): similarity threshold = 35, with the similarity distribution]
The iterative clustering program provides a number of separate but
related products: (1) a similarity listing which contains an index of
task pattern similarity between each man and every other man; (2) a
variance listing which shows the variance of each individual's simi-
larities; (3) a frequency distribution of similarities for each of
three task lists; (4) a series of cluster listings, each showing a
pivot man (the highest variance in the cluster) and all personnel with
a similarity index high enough to qualify for that cluster; and (5) a
task listing for each cluster, giving every task performed by personnel
in that cluster and the number performing the task. The processing
steps necessary to produce this output are shown in the form of a
flowchart in Figure 1. There are several procedures that must be
followed after production of the initial cluster runs.

The process of cluster grouping is an experimental one; that is, a 
series of computer runs must be made at different similarity thresholds 
in order to determine which ST satisfies the criteria used to evaluate 
the clusters. In this research, 30 different computer runs were made 
in the three task areas. Since each cluster run usually differs in the 
number of clusters, the pattern of task association, the homogeneity of 
the clusters, and the identity and variance of every pivot man but the 
first, it is necessary to examine a series of experimental clusters in 
order to obtain an "optimum" cluster run. The latter results in what 
are termed "specialty clusters." 

Specialty clusters are characterized by (1) relatively low number
of unclustered personnel; (2) high number of individual clusters; (3)
avoidance of excessively large "initial" clusters or very small "trail-
ing" clusters; (4) low incidence of overlapping cluster membership;
(5) high variance of pivot men, especially in the last third of a
cluster run; (6) low variance of low similarity cluster members, in
order not to lose qualified pivots; and (7) high homogeneity of in-
dividual clusters. The analysis of computer program products is greatly
facilitated by using these criteria for recognizing "optimality" in 
different cluster runs. However, in order to help evaluate the mass of 
output data produced by the computer, another program (termed "cluster 
identification") was required to assist in the comparison of cluster 
runs based on different similarity thresholds. 
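Several of the criteria listed above are simple counts that can be tabulated for each run before runs are compared. The sketch below is not the "cluster identification" program, whose details are in Appendix C; it is a hypothetical helper that covers only criteria (1) through (4), and the run data shown are invented for illustration.

```python
def summarize_run(clusters, all_ids):
    """Tabulate counts bearing on criteria (1)-(4) for one cluster run.

    clusters: list of (pivot, members) pairs; all_ids: every respondent code.
    """
    clustered = set()
    overlapping = set()
    sizes = []
    for pivot, members in clusters:
        group = {pivot, *members}
        overlapping |= clustered & group   # men already placed in an earlier cluster
        clustered |= group
        sizes.append(len(group))
    return {
        "clusters": len(clusters),
        "unclustered": len(set(all_ids) - clustered),
        "overlapping": len(overlapping),
        "sizes": sizes,
    }

# Hypothetical run: man C qualifies for both clusters, man F for neither.
run = [("A", ["B", "C"]), ("D", ["E", "C"])]
print(summarize_run(run, ["A", "B", "C", "D", "E", "F"]))
# {'clusters': 2, 'unclustered': 1, 'overlapping': 1, 'sizes': [3, 3]}
```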

The examination of clusters produced by the initial program con- 
sisted of a systematic evaluation of the different sets of clusters 
produced by the different thresholds. Although this analysis preceded 
later refinements in the computer programs, it is not essential to an 
understanding of the clustering techniques that were ultimately 
adopted. As a result, the details of the "cluster identification" 
output and its attendant analysis are contained in Appendix C.

D. Program Refinement



Proceeding from a thorough analysis of the initial clustering pro- 
gram, a refined method of selecting pivot men and their respective 
clusters was developed. The revised procedure consists of two separate 
but related parts: the Pivot Optimization Program, which selects pivot
men; and the Cluster Selection Program, which constructs the clusters
around the pivots.

These techniques were developed in response to an output problem 
created by the initial clustering program. In the normal operation of 
this program, "optimum" potential pivot men could be prevented from 
becoming pivots by their presence or membership in preceding clusters. 
Figure 2 illustrates the problem manifested in the initial cluster 
program. A cluster with an ST of 24 is shown, in which the pivot has
a variance of 100. All personnel with a similarity to the pivot over
24 are clustered, and their similarities with the pivot, as well as
their variances, are also shown. Man X is included in the cluster because
his similarity to the pivot is greater than 24 (i.e., 25). Nevertheless,
his variance is quite high (98), and he could better serve as the next
cluster's pivot than as a marginal member of his present cluster.

Since the initial cluster program does not select pivots from among 
those previously clustered, man X cannot act as a pivot man. For this 
reason, a procedure was developed in which the selection of pivots is 
completed before clustering is initiated. 

Pivot Optimization 

In the refined program, the selection of pivots is optimized in
two ways: first, they should have a high variance, for reasons noted
previously (supra, pp. 9-10); second, they should have a relatively low
similarity with each previously selected pivot man. The latter criterion
enables each pivot to have a separate work area for his cluster.
When two different pivots have a high similarity between them, the 
clusters that are formed around them are likely to be more similar 
than distinct. By selecting pivots who have a low similarity to pre- 
ceding pivots, it is possible to avoid much of the overlapping of 
functional content between clusters. 

The pivot selection process occurs separately for each of the three 
task area subsamples, and every respondent administered a task list is
evaluated in terms of the two optimization criteria noted above. The
function of the program is to order the men in terms of their desirability
as pivots by evaluating both their individual variances and
their similarity to previously selected pivots. The specifics of
this procedure are described below. 




FIGURE 2 

Sample Cluster Configuration

Let p, q, and r be indices of three pivot men. The individual
with the highest variance, V(i), is selected as the initial pivot
man p (or P1). For all $i \neq p$, compute:

$$W(i;p) = \frac{S(i,p)}{V(i)}$$

where

    S(i,p) = similarity of man i to pivot p
    V(i)   = variance of man i

Select min W(i;p), and designate that i = q (or P2).

For all $i \neq p, q$, compute:

$$W(i;p,q) = \max_{p,q} \left[ \frac{S(i,p)}{V(i)},\ \frac{S(i,q)}{V(i)} \right]$$

Select min W(i;p,q), and designate that i = r (or P3).

A generalized procedure for selecting all pivots (other than P1)
employs the following notation. For all $i \notin \{\Pi\}$, compute:

$$W(i;\pi) = \max_{\pi \in \Pi} \left[ \frac{S(i,\pi)}{V(i)} \right]$$

where

    $\{\Pi\}$       = set of all pivot men
    $\pi \in \Pi$   = pivot ($\pi$) is an element of set $\{\Pi\}$

Select min $W(i;\pi)$, and designate that $i = \pi$.
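
The pivot ordering procedure described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the original program: the function name `order_pivots`, the list-of-lists similarity matrix, the example data, and the tie-breaking rule (the lowest index wins) are all introduced here for illustration, and every variance is assumed to be positive.

```python
def order_pivots(similarity, variance, n_pivots=None):
    """Order individuals by desirability as pivots.

    similarity[i][j] holds S(i, j); variance[i] holds V(i) (assumed > 0).
    Returns (pivots, w_values), where w_values[0] is None because W is
    not defined for the first pivot.
    """
    n = len(variance)
    if n_pivots is None:
        n_pivots = n
    # P1: the individual with the highest variance.
    pivots = [max(range(n), key=lambda i: variance[i])]
    w_values = [None]
    while len(pivots) < n_pivots:
        best_i, best_w = None, None
        for i in range(n):
            if i in pivots:
                continue
            # W(i; pi): highest similarity to any selected pivot, over V(i).
            w = max(similarity[i][p] for p in pivots) / variance[i]
            # Assumption: ties are broken in favor of the lowest index.
            if best_w is None or w < best_w:
                best_i, best_w = i, w
        pivots.append(best_i)
        w_values.append(best_w)
    return pivots, w_values


# Hypothetical four-man example (values are not taken from the report).
EXAMPLE_S = [
    [0, 30, 10, 24],
    [30, 0, 15, 40],
    [10, 15, 0, 5],
    [24, 40, 5, 0],
]
EXAMPLE_V = [97, 90, 40, 60]

if __name__ == "__main__":
    print(order_pivots(EXAMPLE_S, EXAMPLE_V))
```

Because a man's W can only grow as pivots are added to the set, the W values of successive pivots never decrease in this sketch, which agrees with the behavior of the listing described in the text.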



Appendix D contains a computer listing representing partial output
of the pivot optimization program. For P1, it lists his identification
code (ID) and variance. For each succeeding pivot, it lists his ID and
variance, his similarity with $\pi$ (the highest similarity with any $\pi$ in
the set $\{\Pi\}$), and his $W(i;\pi)$. Because lower values indicate more
desirable pivots, the $W(i;\pi)$ indices increase with each succeeding pivot
man on the list. Preliminary decisions regarding the number of pivots to use for
clustering (and, therefore, the number of clusters in a task area) were
based on an analysis of these listings. For example, the partial
Propulsion/Auxiliary listing on page 19 shows a break in the pivot
optimization values after the seventh man. Thus the first seven pivots
were tentatively selected to form experimental clusters. The number of
clusters employed could be changed by simply altering the number of
pivots chosen from the list. A more refined technique for selecting
the number of clusters, without reference to pivots, was subsequently
developed and is elaborated in a later section. Procedures for clustering
around the pivots selected are contained in the following discussion.











Partial Pivot Optimization Listing
(Propulsion/Auxiliary Task Area)





Cluster Selection 



Once the "optimum" pivots have been determined, the selection of
their clusters is a relatively simple process. All personnel to be
clustered (i.e., those other than pivots) are considered separately.
Each individual is assigned to the cluster of the pivot man with whom
he has the greatest similarity. An individual thus appears
in only one cluster, except when his highest
similarity is shared by two or more pivot men. In the latter case, such
individuals appear in the clusters of all the tied pivots
and also appear in a separate listing of ties.
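
The assignment rule just described can be sketched in Python. This is an illustrative reading of the text rather than the original Cluster Selection Program: the function name `select_clusters`, the data layout, and the example matrix are assumptions made here.

```python
def select_clusters(similarity, pivots):
    """Assign each non-pivot to the pivot(s) with whom he is most similar.

    Returns (clusters, ties). clusters maps each pivot to a list of
    (member, similarity) pairs in descending order of similarity; ties
    lists every member whose highest similarity is shared by several pivots.
    """
    clusters = {p: [] for p in pivots}
    ties = []
    for i in range(len(similarity)):
        if i in pivots:
            continue
        best = max(similarity[i][p] for p in pivots)
        nearest = [p for p in pivots if similarity[i][p] == best]
        # A tied individual is entered in every cluster involved in the tie.
        for p in nearest:
            clusters[p].append((i, best))
        if len(nearest) > 1:
            ties.append((i, nearest))
    for members in clusters.values():
        members.sort(key=lambda m: m[1], reverse=True)
    return clusters, ties


# Hypothetical five-man example; men 0 and 1 are taken as the pivots.
EXAMPLE_S = [
    [0, 5, 30, 12, 20],
    [5, 0, 10, 25, 20],
    [30, 10, 0, 7, 9],
    [12, 25, 7, 0, 6],
    [20, 20, 9, 6, 0],
]

if __name__ == "__main__":
    print(select_clusters(EXAMPLE_S, [0, 1]))
```

In the example, man 4 is equally similar to both pivots, so he is entered in both clusters and in the listing of ties.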



The output of the cluster selection program is in the form of a 
listing of clusters, a sample of which is shown on page 21. This 
listing gives the ID of the pivot man around which the cluster is 
formed, and then lists other cluster members by order of descending 
similarity to the pivot. The cluster member's ID, his index of simi- 
larity with the pivot man, and his variance constitute each line entry 
in the cluster listing. Appendix E contains some examples of cluster 
listings for each of the three engineering task areas.

Partial Cluster Selection Listing
(Propulsion/Auxiliary Task Area)

III. ANALYSIS OF CLUSTERING TECHNIQUES



There are several considerations which emerge from the preceding 
discussion of computer techniques for clustering tasks. First, in 
using methods which employ a "pivotal" task pattern as the reference 
point for grouping similar task patterns, the selection of those pivots 
is critically important. Second, the particular technique used to 
cluster task patterns around a "core" can vary, depending on the cri- 
teria used to evaluate clusters and the particular research objectives 
involved. Third, the development of "optimum specialty clusters" 
necessitates some procedure for regulating the size and homogeneity 
of clusters. 

Each of these three problem areas was examined in detail in the 
process of developing techniques for the analysis of task patterns.
The procedures employed in this analysis and the determinations resulting
from it are contained in the following discussion.

A. Effects of Differential Pivot Selection 



The primary criterion for the selection of pivots in this research, 
regardless of the specific technique employed, has been the magnitude 
of an individual's variance. Thus, for a given task area, those in- 
dividuals who possessed a high variance of task pattern similarities 
were more likely to become pivots than those with lower variances. By 
comparing the initial pivot selection technique with that employed in 
the pivot optimization program, the more effective method for optimizing
the selection of pivots can be determined. Table 3 shows the results
of this comparison in an abbreviated list of pivots produced by
the two programs.

Of the two methods of pivot selection in Table 3, note that the
initial pivot selection technique is shown under four different conditions;
that is, pivots were selected with similarity thresholds set at
25, 21, 19, and 16. Although the ST is used primarily to regulate
entry into clusters, it also affects the number and kinds of pivots
selected (see Appendix C).

Under the four conditions, the variances of pivots P1 through P15
range from 97 to 44, 97 to 36, 97 to 31, and 97 to 21, respectively. It can
be seen from Table 3 that as the ST is lowered, so is the variance of
Pn. But even with ST = 25 (a relatively high threshold), P15 has a
variance of only 44.



In contrast to the initial method of selecting pivots, the pivot 
optimization technique results in a single list of pivots in which the 
variance is maximized for each — while still producing pivots that are 
mutually distinctive in their task patterns. Although the threshold 
is set at ST = 1 (which, in effect, sets no restriction on cluster 
membership), Table 3 shows the range of pivot variances to be 97-62. 








TABLE 3

Comparison of Two Pivot Selection Techniques
as Applied to the Propulsion/Auxiliary Task Area

Order of   Initial Pivot Selection Technique                   Pivot Optimization
Selection                                                      Technique
           ST=25        ST=21        ST=19        ST=16        ST=1
           ID Code s^2  ID Code s^2  ID Code s^2  ID Code s^2  ID Code s^2

P1         72242   97   72242   97   72242   97   72242   97   72242   97
P2         62026   97   62026   97   62026   97   62026   97   62033   90
P3         52211   83   52211   83   52234   80   72245   64   52215   62
P4         82028   75   82201   56   82016   49   52217   53   82224   69
P5         72245   64   92022   53   52409   44   52409   44   92021   82
P6         82201   56   82228   49   [...]   43   [...]   43   02218   82
P7         02030   54   62206   45   52218   41   52006   38   82028   75
P8         82009   53   52409   44   [...]   38   92205   31   72249   80
P9         [...]   50   [...]   44   62017   36   52227   31   52234   80
P10        72251   49   [...]   41   72237   35   [...]   30   [...]   68
P11        02224   47   52218   41   [...]   33   02026   29   [...]   75
P12        92033   45   [...]   39   92034   33   72257   26   92031   78
P13        [...]   45   [...]   38   02210   32   62013   23   52014   87
P14        52409   44   62021   38   92205   31   82024   22   92010   82
P15        92226   44   62017   36   52227   31   92225   21   92023   68

[...] = ID code not recoverable from the source text.

In fact, few of the pivots produced by the initial method (regardless
of the ST) are even listed in the first 15 pivots selected by the
optimization method. The critical feature of the latter technique is
the avoidance of the problem illustrated in Figure 2 (supra, p. 17),
whereby potential pivots are lost through inclusion in preceding
clusters.



B. Effects of Differential Cluster Formation 



Aside from the particular method used to select pivots, the appli- 
cation of the two clustering techniques results in different cluster 
effects. In the initial cluster program, individuals are grouped into 
clusters when their similarity to a pivot exceeds a stated minimum.
In contrast, the cluster selection technique associated with the pivot
optimization program produces clusters by grouping individuals together
by their highest similarity to a given pivot.



The resulting clusters produced by these two techniques differ in 
one important respect. In the initial cluster program, a sizable 
number of personnel appear in more than one cluster because their 
similarity to a number of pivots exceeds the threshold. For example, 
in the Propulsion/Auxiliary task area these multiple memberships com- 
prise between ^k% and 72% of the sample — depending on the particular 
similarity threshold set for the cluster run. Multiple memberships 
constitute a factor which frequently has a negative effect on cluster 
homogeneity. This is due to the introduction of heterogeneous segments 
of task patterns into more than one cluster. 
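
How a threshold rule gives rise to multiple memberships can be illustrated with a short Python sketch. It models only the entry rule (similarity to a pivot over the ST) for a fixed set of pivots; the interleaved pivot selection of the initial cluster program is not reproduced, and the function names and example values are assumptions made for illustration.

```python
def memberships_over_threshold(similarity, pivots, st):
    """Map each non-pivot to every pivot whose similarity to him is over st."""
    result = {}
    for i in range(len(similarity)):
        if i in pivots:
            continue
        result[i] = [p for p in pivots if similarity[i][p] > st]
    return result


def share_multiple(memberships):
    """Fraction of non-pivots who fall into more than one cluster."""
    multiple = sum(1 for ps in memberships.values() if len(ps) > 1)
    return multiple / len(memberships)


# Hypothetical five-man example; men 0 and 1 are taken as the pivots.
EXAMPLE_S = [
    [0, 5, 30, 12, 20],
    [5, 0, 10, 25, 20],
    [30, 10, 0, 7, 9],
    [12, 25, 7, 0, 6],
    [20, 20, 9, 6, 0],
]

if __name__ == "__main__":
    for st in (11, 24):
        m = memberships_over_threshold(EXAMPLE_S, [0, 1], st)
        print(st, m, share_multiple(m))
```

With the low threshold, two of the three non-pivots enter both clusters; with the high threshold no one does, but one man is left unclustered.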



Conversely, the cluster selection program clusters by reference to 
an individual's highest similarity and, as a result, individuals gen- 
erally appear in a single cluster. The only exception occurs when an 
individual's highest similarity relates to more than one pivot. In 
order to understand the differential effect of these two methods of 
cluster formation, an analysis of cluster homogeneity was undertaken. 



The evaluation of a set of clusters is accomplished by reference to 
the criterion of homogeneity. "Optimum specialty clusters" are those 
which maximize task pattern homogeneity within a cluster. Since clus- 
ters are formed by the relationship of an individual's similarity to a 
pivot, there is no assurance that this relationship will automatically
result in high similarity among all personnel in a given cluster. In 
order to maximize the criterion of homogeneity, a computer program 
called the Cluster Verification routine was developed. This program 
employs an input of individual task patterns in a given cluster, gen- 
erates an intra-cluster similarity matrix, and produces an output which 
shows the mean task pattern similarity of the entire cluster (cluster 
verification score or CVS) and the standard deviation. 



It also identifies each individual in the cluster by code, the mean
of each individual's similarities with all other cluster
members (vector verification score or VVS), and the standard deviation.
Appendix F contains an example of the computer output for one cluster
in the Electrical task area.
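
The two verification scores can be sketched in Python as below. The report does not give the exact formulas, so the following are assumptions made for illustration: self-similarities are excluded, each pair of members is counted once in the CVS, and the population standard deviation is used. The function name `verify_cluster` and the example values are likewise hypothetical.

```python
from statistics import mean, pstdev


def verify_cluster(similarity, members):
    """Return (cvs, vvs) for a cluster of at least two members.

    cvs is (mean, standard deviation) over all distinct member pairs.
    vvs maps each member to (mean, standard deviation) of his similarities
    with all other members of the cluster.
    """
    vvs = {}
    for i in members:
        row = [similarity[i][j] for j in members if j != i]
        vvs[i] = (mean(row), pstdev(row))
    pairs = [
        similarity[i][j]
        for a, i in enumerate(members)
        for j in members[a + 1:]
    ]
    cvs = (mean(pairs), pstdev(pairs))
    return cvs, vvs


# Hypothetical three-man cluster.
EXAMPLE_S = [
    [0, 30, 20],
    [30, 0, 10],
    [20, 10, 0],
]

if __name__ == "__main__":
    print(verify_cluster(EXAMPLE_S, [0, 1, 2]))
```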

Using the verification scores (CVS) to measure cluster homogeneity, 
different cluster arrangements produced by the two computer programs 
can be compared and evaluated. Table 4 indicates the various CVS values 
for five clusters in the Electrical area. These clusters, produced by
the two clustering techniques, employ identical pairs of pivots and 
identical thresholds. By holding the pivot factor and threshold factor 
constant, the effect of multiple memberships can be examined in isola- 
tion. 

Table 4 shows the differences in homogeneity of clusters produced
by the two clustering techniques under varying conditions. In most
cases, the effect of multiple memberships has been the dilution of
cluster homogeneity. For instance, with the initial cluster program
run at ST=25, the five clusters show mean similarities (CVS) of 29.90,
28.82, 27.55, 27.93, and 28.67. On the other hand, the five clusters
produced by the cluster selection program at the same threshold (i.e.,
25) show consistently higher CVS values of 34.98, 30.79, 28.11, 28.31,
and 28.73. For some clusters (e.g., C5) the increase in homogeneity
is minimal, but for others (e.g., C1) it is fairly large. Aside from
the multiple memberships in the "initial" clusters, these task groupings
are identical.

It is interesting to note that with the cluster selection program
set at ST=1 (where 100% of the respondents are clustered), and the
initial program set at ST=25 (where only 75% are clustered), the
homogeneity of one "optimization" cluster (i.e., C4) is still greater
than its counterpart, and another (i.e., C1) is quite similar. Thus,
even with no effective threshold, the cluster selection technique
sometimes produces greater homogeneity than the initial technique with
a threshold.

When higher threshold runs are compared, there appears to be little
difference between the two clustering techniques. However, at those
thresholds (i.e., ST=27, 30, or 33) the number of personnel that are
clustered is small. At ST=33, for instance, only 44% are clustered,
compared with 75% at ST=25.

C. Effects of Differential Threshold Regulation 

Regardless of the method employed in either pivot selection or 
cluster formation, the extent of homogeneity in a cluster will ulti- 
mately depend on the "entry level" established for the particular 
cluster. The entry level is a designated value which defines the 
minimum level of similarity required for inclusion in a cluster. 

TABLE 4

Comparison of Two Cluster Formation Techniques
as Applied to the Electrical Task Area

In the initial cluster program, the entry level is stated in terms
of a similarity threshold (ST) which regulates the entry of personnel
into a cluster by their similarity to the pivot. By increasing the ST,
and thus making entrance to a cluster more restrictive, the homogeneity
of a cluster is also raised. However, because the more restrictive
cluster entrance requirement necessarily excludes more personnel, every
increase in the ST results in an increased number of unclustered personnel.
Thus, improved cluster homogeneity is obtained only at the cost of
excluding a sizable portion of the sample of task patterns.

Aside from the similarity threshold as a method of cluster regulation,
there is a different kind of entry level that might be used in
maximizing the homogeneity of clusters. The latter is obtained
from a cluster verification listing (see Appendix F), which shows not only
the mean similarity of the cluster as a whole (CVS) but also the mean
similarity of each cluster member with all other members
(VVS). The relative effectiveness of these two types of threshold is
shown in Table 5.



Table 5 contains the components of a single cluster: listed thereon
are the identification codes, the similarity of each individual with the
pivot of the cluster [S(i,p)], and the mean similarity (VVS) of each cluster
member's task pattern relationships. Employing the ST method of regulating
the size and homogeneity of clusters, the thresholds were set at
ST = 34, ST = 28, and ST = 14, yielding clusters with m = 10, m = 15,
and m = 20, respectively. The cluster verification scores (CVS) for
these potential clusters, as well as the total, are listed at the
bottom of the table.

With the same size clusters, thresholds were set by the mean similarity
of each individual's vector of similarities (VVS). The CVS
scores for clusters set at those thresholds (i.e., VVS = 25, 23, and 13)
are also listed at the bottom of the table.

For the complete cluster (m = 21), the CVS is necessarily identical,
because the cluster membership is identical. Similarly, the same
CVS is obtained for both cluster regulation methods at m = 15, again
because of identical memberships. However, for the most restrictive
threshold (m = 10), the homogeneity of the cluster is greater when
using the mean vector similarity (VVS) as a threshold than when using
the similarity to the pivot (ST). The same result emerges
when the two clusters of m = 20 are compared.

Based on this analysis, the results indicate that the size of a 
cluster and its homogeneity can, in some cases, be optimized by employ- 
ing mean similarity, rather than similarity to the pivot, as the method 
of regulating clusters. However, the difference in results produced by 
the two techniques is not so great as those shown between the two pivot 
selection techniques and the two methods of cluster formation.

TABLE 5

Comparison of Two Threshold Regulation Techniques
As Applied to a Cluster in the Electrical Task Area

Identification Code     Similarity     Identification Code     Mean Similarity
(Arrayed by             With Pivot     (Arrayed by Mean        of Each
Similarity              [S(i,p)]       Similarity to All       Cluster Member
to the Pivot)                          Cluster Members)        [VVS]

52406 (pivot)                          52406 (pivot)           30
02810                   44             72420                   30
72420                   41             02810                   29
62428                   38             62428                   27
72418                   36             72418                   26
52424                   36             52424                   26
82415                   35             62422                   26
62430                   35             62430                   25
62422                   35             02804                   25
82418   m=10            34             82427   m=10            25
92416                   33             62433                   24
02804                   31             82415                   23
82427                   31             82418                   23
52425                   31             92416                   23
62433   m=15            28             52425   m=15            23
92208                   26             92208                   23
62413                   22             62413                   19
92406                   20             92406                   17
82416                   18             82416                   14
02811   m=20            14             72434   m=20            13
72434                   13             02811                   11

Cluster                 Mean Similarity        Cluster                 Mean Similarity
Designation             of Cluster (CVS)       Designation             of Cluster (CVS)

m=10                    32.00                  m=10                    32.60
m=15                    29.30                  m=15                    29.30
m=20                    23.99                  m=20                    24.23
m=21                    22.86                  m=21                    22.86
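
The membership differences behind the CVS values in Table 5 can be checked with a short Python sketch. The two orderings are transcribed from the table; the function name `membership_difference` is introduced here for illustration, and the code assumes that the entry printed as 77420 in one column of the source is the same man as 72420 in the other.

```python
# Members arrayed by similarity to the pivot (left-hand columns of Table 5).
# Assumption: "77420" in the scanned source is read as 72420.
BY_PIVOT_SIMILARITY = [
    "52406", "02810", "72420", "62428", "72418", "52424", "82415",
    "62430", "62422", "82418", "92416", "02804", "82427", "52425",
    "62433", "92208", "62413", "92406", "82416", "02811", "72434",
]

# Members arrayed by mean similarity, VVS (right-hand columns of Table 5).
BY_MEAN_SIMILARITY = [
    "52406", "72420", "02810", "62428", "72418", "52424", "62422",
    "62430", "02804", "82427", "62433", "82415", "82418", "92416",
    "52425", "92208", "62413", "92406", "82416", "72434", "02811",
]


def membership_difference(m):
    """Compare the clusters of size m produced by the two orderings.

    Returns (only_by_st, only_by_vvs): members admitted under one method
    of threshold regulation but not under the other.
    """
    by_st = set(BY_PIVOT_SIMILARITY[:m])
    by_vvs = set(BY_MEAN_SIMILARITY[:m])
    return sorted(by_st - by_vvs), sorted(by_vvs - by_st)


if __name__ == "__main__":
    for m in (10, 15, 20, 21):
        print(m, membership_difference(m))
```

The memberships coincide at m = 15 and m = 21 and differ at m = 10 and m = 20, which is where the two CVS columns of the table diverge.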



D. Summary 



The preceding sections have emphasized the three stages in developing
clusters; namely, (1) the selection of optimum pivots, (2) the
formation of clusters around pivots, and (3) the regulation of size
and homogeneity of clusters by a threshold. For each of these processes,
two techniques have been compared.

In the case of pivot selection, the initial pivot program and the 
pivot optimization program were analyzed in terms of their respective 
output. The latter technique was found to be the more effective in 
maximizing the variance of pivots, while still maintaining the task 
pattern distinctions among pivots. 

The two methods of forming clusters were compared in terms of the 
membership of clusters and their homogeneity. Of the two programs, the 
cluster selection technique was found to contribute more to cluster 
homogeneity, through the avoidance of multiple memberships, than the 
initial program. 

In considering techniques for regulating clusters, the similarity 
threshold (ST) contributes somewhat less to cluster homogeneity than 
the threshold derived from mean vector similarities (VVS). Because 
the regulation of clusters through manipulation of thresholds is so 
important in developing homogeneous clusters of work requirements, a 
more detailed analysis of this area was conducted in terms of the
Unified Cluster System (UCS), elaborated in the following section.

IV. UNIFIED CLUSTER SYSTEM



In analyzing the effects of differential pivot selection and cluster 
formation, it was possible to develop a "unified" computer program which 
could — in a single run — produce most of the desired outputs necessary to 
formulate decisions regarding the size and number of optimum specialty 
clusters in a given occupational field. To accomplish this end, a 
series of computer programs was integrated in a single "package" designed
to provide the data necessary for a comprehensive analysis; this
integrated group of programs was designated the "Unified Cluster System"
or UCS.




Input for the UCS consists of a magnetic tape containing the original 
matrices of task pattern similarities derived from the deck of cards pro- 
duced by responses on the task list questionnaires. Output is comprised 
of printouts that were previously the result of separate computer runs. 
These outputs include a variance listing, pivot optimization listing, 
printout of "ties," cluster listing, cluster verification listing, and 
an output of punch cards containing the task patterns of respondents 
arranged by cluster in the same form as the cluster listing (see 
Figure 3 for processing procedures). The unique feature of UCS is its 
capability of producing multiple runs with card output. 

Instead of clustering all respondents' task patterns around a pre- 
determined number of pivots judged to be appropriate for a given task 
area, the UCS contains an iterative procedure for fixing the number of 
clusters. This technique groups all personnel into two clusters, then 
produces a complete UCS output package. The program then recycles and 
groups the task patterns by their respective similarity to three pivots — 
again, with the appropriate output. Each time the process is iterated, 
it adds a pivot from the pivot optimization listing in preferential 
order. Thus, the program results in a series of cluster sets; the first 
set containing two clusters, the second set three clusters, the third 
set four clusters, and so on until the pivot optimization list is ex- 
hausted. With this output, an occupational area can be evaluated in 
terms of two or more optimum specialty clusters, without initial deci- 
sions as to the optimum number and size of clusters. 
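
The iterative procedure can be sketched in Python as follows. Only the regrouping step is shown; the listings, verification scores, and card output that accompany each iteration are omitted. The function name `ucs_cluster_sets`, the data layout, and the example matrix are assumptions made for illustration.

```python
def ucs_cluster_sets(similarity, pivot_order):
    """Produce one set of clusters per iteration, starting with two pivots.

    pivot_order is the pivot optimization list in preferential order.
    Each set maps a pivot to the members whose highest similarity is
    with that pivot (tied members are entered under every tied pivot).
    """
    cluster_sets = []
    for k in range(2, len(pivot_order) + 1):
        pivots = pivot_order[:k]
        clusters = {p: [] for p in pivots}
        for i in range(len(similarity)):
            if i in pivots:
                continue
            best = max(similarity[i][p] for p in pivots)
            for p in pivots:
                if similarity[i][p] == best:
                    clusters[p].append(i)
        cluster_sets.append(clusters)
    return cluster_sets


# Hypothetical five-man example with a pivot list of three men.
EXAMPLE_S = [
    [0, 15, 20, 30, 12],
    [15, 0, 8, 10, 25],
    [20, 8, 0, 35, 5],
    [30, 10, 35, 0, 9],
    [12, 25, 5, 9, 0],
]

if __name__ == "__main__":
    for clusters in ucs_cluster_sets(EXAMPLE_S, [0, 1, 2]):
        print(clusters)
```

In the example, man 3 belongs to the first pivot's cluster in the two-cluster set but moves to the third pivot's cluster once that pivot is added, so the same pivot anchors a different membership in each set.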

Although each set includes the pivots of the preceding set, the 
addition of each new pivot causes the successive iterations to form 
different task patterns. This is because the personnel are redistrib- 
uted in terms of their highest similarity to a pivot. As a result, 
when the members of two clusters are presented with a third pivot for 
comparison of task pattern similarity, that third pivot will usually 
attract some marginal members of the initial clusters. Each time the 
program recycles, the same pivot will frequently attract a somewhat
different constellation of cluster members. Thus, the first pivot
selected (P1) will provide the basis for a maximum of k different
clusters (k = number of iterations or sets); and the terminal pivot
(Pk) necessarily attracts a single cluster.

FIGURE 3. Computer Processing Procedures in the UCS

A. Designation of Specialty Clusters

In order to isolate the specialty clusters in an occupational area, 
a few limitations must be imposed on the process of cluster analysis. 
First, the size of the sample in a task area dictates the upper and
lower limits for a cluster in that area. For example, in this research 
it did not appear feasible to employ clusters of less than ten respon- 
dents. The description of clusters in terms of the technical, organi- 
zational, and communicational variables would not be statistically 
meaningful with very small clusters because of the paucity of data. 
Similarly, excessively large clusters would exhaust most of the sample 
in a particular task area, leaving few respondents as a source of data 
to describe other clusters in the area. As a result, the particular 
constraints of size in this occupational sample were set within the 
flexible limits of between 10 and 50 personnel. The clusters which 
emerged from UCS did not indicate that these constraints posed a sig- 
nificant limitation on the process of cluster analysis. 

A second constraint in designating specialty clusters involves 
threshold regulation. The UCS, unlike the initial cluster program, 
clusters all personnel in the sample according to their highest simi- 
larity to a pivot. Because of this, there are a number of respondents' 
task patterns that do not adhere closely to any pivot, but are never- 
theless included in those clusters to whose pivot they are most similar. 
These personnel have marginal or deviate task patterns because: (1)
they were new arrivals on board ship at the time of sampling (and thus
performed an erratic and incomplete list of tasks); (2) they did not
complete the task list questionnaire; (3) the questionnaire was improperly
filled out; (4) the survey instructions were misunderstood;
or (5) simply because their task patterns were relatively unique on
the particular ship(s) sampled. Whatever the reason, the task patterns
associated with these personnel detract from cluster homogeneity to a
significant degree. It is the precise purpose of the similarity
threshold (ST) to eliminate such deviant cases, provided that cluster
similarity is not promoted at the expense of a sizable portion of the
sample.

In light of the constraints discussed above, the initial step in 
designating specialty clusters involves setting thresholds on all 
clusters produced in the three task areas by the UCS program. This 
process depends in part on the judgment of the research staff in 
analyzing the UCS output cluster by cluster. The procedure employed 
is identical for every cluster, so that reference to one example will 
suffice to describe the process used for all clusters.

The following page contains a partial UCS printout of a cluster
listing. Identification codes for the pivot and all cluster members
are shown, along with each individual's similarity index ordered from
high to low. To eliminate marginal cluster members, one proceeds from
the top of the list and skips the first ten indices (which represent
the minimum limit on cluster size). Continuing down the list, note
that the similarity indices are sequentially continuous until one
reaches those individuals with an index of 29. Thereafter begins a
series of gaps, starting with a gap of seven between the continuous
similarities ending at 29 and the index of 22, as indicated by the arrow.
If the last three individuals with low similarities were included in
this cluster, they would dilute the homogeneity of the cluster
disproportionately.

It is the identification of the significant interstice in a series 
of similarity indices that depends on the judgment of the researcher— 
although it is not so arbitrary as it may appear. If cluster verifica- 
tion scores (CVS) were computed for this cluster, starting with the 
initial ten indices and adding one additional individual each time, 
the first large drop in cluster homogeneity would appear at the same 
point (i.e., between 29 and 22) identified in this example. 
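
The gap-finding step can be approximated mechanically. The sketch below is not the procedure used in the research, which rested on the researcher's judgment; it assumes a fixed minimum gap (the hypothetical parameter `min_gap`) and uses an invented list of similarity indices that mimics the example, continuous down to 29 and followed by three low values.

```python
def split_at_first_gap(similarities, min_size=10, min_gap=3):
    """Return (kept, dropped) by cutting a cluster listing at its first large gap.

    The first `min_size` indices are always kept (minimum cluster size).
    `min_gap` is an assumed cutoff for what counts as a significant interstice.
    """
    ordered = sorted(similarities, reverse=True)
    for i in range(min_size, len(ordered)):
        if ordered[i - 1] - ordered[i] >= min_gap:
            return ordered[:i], ordered[i:]
    return ordered, []


# Invented listing: indices run continuously from 46 down to 29, then 22, 17, 11.
indices = list(range(46, 28, -1)) + [22, 17, 11]
kept, dropped = split_at_first_gap(indices)
print("lowest index kept:", kept[-1])
print("dropped:", dropped)
```

With these figures the cut falls between 29 and 22 and the last three individuals are excluded, as in the example discussed above.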

In an identical manner, thresholds were set for each cluster to
eliminate marginal contributors to cluster homogeneity. Verification
scores (CVS) were then computed for all "refined" UCS clusters using
the Cluster Verification routine discussed in a previous section
(supra, pp. 25, 26).



The next step in designating optimum specialty clusters is to decide
which set of clusters produced by UCS (and refined by setting
thresholds) is to represent the homogeneous segments of work that are
characteristic of an occupational area. Each iteration of the UCS
produced a set of clusters utilizing the entire sample in a task area;
thus some choice must be made among the k sets of clusters. Table 6
shows a partial array of cluster sets from the Propulsion/Auxiliary
task area, beginning with three and terminating with fourteen
clusters. The first two columns contain the cluster number (e.g.,
"C1") and the identification code of its pivot (e.g., "72242"); each
succeeding column contains one set of clusters, showing the size of
each cluster (m) in that set as well as its degree of internal
homogeneity (as determined by the measure of mean similarity provided
by CVS computations).

In selecting an optimum set of clusters to represent a given task
area, a series of criteria can be used to delimit the scope of the
problem. The object in choosing among the alternative sets produced
by UCS is to (1) maximize cluster homogeneity, (2) maximize the number
of clusters representing the task area, (3) maximize the number of
personnel (i.e., task patterns) accounted for within the bounds of the
similarity thresholds, and (4) minimize the number of clusters that
fall outside the size constraints of 10 to 50.

Initially, half of the sets listed in Table 6 can be eliminated from
consideration because they clearly fail the criteria noted above.
That is, of the 12 sets shown, six sets (S1, S2, S9, S10, S11, and S12)
can be excluded because of the relatively small number of personnel
accounted for in clustering and/or because of the relatively large
number of clusters invalidated by exceeding the size constraints (i.e.,
TABLE 6

Summary Array of Partial UCS Output
for the Propulsion/Auxiliary Task Area

Cluster  Pivot                                        Set Number
Number   Identif.        S1    S2    S3    S4    S5    S6    S7    S8    S9   S10   S11   S12

C1       72242    m      65    59    47    47    43    43    43    43    32    30    28    24
                  CVS  22.4  23.9  25.0  25.0  25.4  25.4  25.4  25.4  29.4  29.5  29.6  30.9

C2       62033    m      45    43    23    23    22    21    21    19    19    19    19    19
                  CVS  21.5  22.2  25.6  25.6  25.9  25.9  25.9  26.2  26.2  26.2  26.2  26.2

C3       52215    m      65    50    50    36    42    42    32    32    23    23    23    23
                  CVS  21.4  21.2  21.2  21.2  20.3  20.3  20.6  20.6  22.5  22.5  22.5  22.5

C4       82224    m            39    51    21    21    21    16    16    16    16    16    16
                  CVS        21.7  17.7  20.6  20.6  20.6  21.3  21.3  21.3  21.3  21.3  21.3

C5       92021    m                  31    31    25    23    23    22    17    17    15    15
                  CVS              23.8  23.8  24.3  24.5  24.5  24.6  23.8  23.8  23.2  23.2

C6       02218    m                        36    36    36    21    20    17    17    17    17
                  CVS                    22.4  22.4  22.4  22.4  23.2  25.7  25.7  25.7  25.7

C7       82025    m                              20    19    19    19    15    11    11     9
                  CVS                          20.6  20.8  20.8  20.8  23.8  25.0  25.0  25.1

C8       72249    m                                     9     9     9     7     6     6     6
                  CVS                                19.1  19.1  19.1  20.6  24.2  24.2  24.2

C9       52234    m                                          35    35    35    35    26    26
                  CVS                                      21.7  21.7  21.7  21.7  23.9  23.9

C10      62015    m                                                 6     5     4     4     4
                  CVS                                            24.7  26.2  32.3  32.3  32.3

C11      72208    m                                                       9     9     8     8
                  CVS                                                  29.3  29.3  28.8  28.8

C12      92031    m                                                             8     8     7
                  CVS                                                        25.6  25.6  26.2

C13      52014    m                                                                   7     6
                  CVS                                                              33.2  34.1

C14      92010    m                                                                         9
                  CVS                                                                    28.0

Number of Clusters
                          3     4     5     6     7     8     9    10    11    12    13    14

No. Personnel Clustered
                        175   191   202   194   209   214   219   221   195   195   188   189

No. Clusters <10 and >50
                          2     1     1     0     0     1     1     2     3     4     5     7

clusters which are <10 and >50). Of the remaining six sets, S3 can be
excluded because of the small number of clusters (i.e., 5), and S4
because of the relatively small number of personnel clustered (i.e.,
194). S8, although containing the highest number of clustered
individuals (221), has two clusters of less than 10 respondents each.
In the three sets left, there is little to choose in the way of
cluster homogeneity among those clusters that can be commonly compared
(i.e., C1 to C7). Therefore, the final selection must be made on the
basis of the number of clusters in a set and the number of personnel
clustered. On both criteria, S7 "optimizes" the choice, even though
one cluster in that set is slightly undersized (C8, where m=9). As a
final check on this process, a computer program was developed to
analyze internal cluster homogeneity in terms of the task pattern
similarity between clusters.
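
The elimination just described can be retraced with the three summary rows of Table 6. The sketch below is only an aid to following the argument: the numeric cutoffs (`min_clusters`, `min_personnel`, `max_violations`) are assumptions chosen so that the filter reproduces the exclusions made in the text, whereas the research staff applied the criteria by judgment rather than by fixed limits.

```python
from typing import NamedTuple


class ClusterSet(NamedTuple):
    name: str
    n_clusters: int
    n_personnel: int
    n_size_violations: int  # clusters with m < 10 or m > 50


# Summary rows of Table 6 (sets S1 to S12 hold 3 to 14 clusters).
personnel = [175, 191, 202, 194, 209, 214, 219, 221, 195, 195, 188, 189]
violations = [2, 1, 1, 0, 0, 1, 1, 2, 3, 4, 5, 7]
TABLE_6 = [
    ClusterSet(f"S{k}", k + 2, p, v)
    for k, (p, v) in enumerate(zip(personnel, violations), start=1)
]


def select_set(sets, min_clusters=6, min_personnel=200, max_violations=1):
    """Filter on assumed cutoffs, then prefer more clusters and more personnel."""
    survivors = [
        s for s in sets
        if s.n_clusters >= min_clusters
        and s.n_personnel >= min_personnel
        and s.n_size_violations <= max_violations
    ]
    best = max(survivors, key=lambda s: (s.n_clusters, s.n_personnel))
    return best, survivors


best, survivors = select_set(TABLE_6)
print([s.name for s in survivors], "->", best.name)
```

Under these assumed cutoffs the survivors are S5, S6, and S7, and S7 is preferred on both the number of clusters and the number of personnel clustered.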



B. Evaluation of Cluster Similarity Distance 



In order to evaluate the task pattern differences between clusters,
a computer program was developed to build a matrix of similarities
parallel to the similarity matrix used for Cluster Verification (CVS);
its output provides measures of inter-cluster distance. This is done
by computing the mean value of all cells in the task pattern
similarity matrix of two clusters. These values (termed Cluster
Distance Scores or CDS) indicate the extent to which the clusters in a
set, taken two at a time, are discrete or similar. Ideally, the
difference in task patterns between clusters should be significantly
greater than the difference in task patterns within clusters. Since
the CVS and CDS are identical in terms of computational procedures, a
direct comparison is possible. Table 7 contains a matrix of
inter-cluster similarities for eight Propulsion/Auxiliary area
clusters in set seven (although there were nine UCS clusters listed
for S7, the smallest cluster [C8] was eliminated because of its low
homogeneity and inadequate size, so that the cluster listed as C9 in
Table 6 appears as C8 in Table 7).
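
A minimal sketch of the two scores follows, assuming that a symmetric matrix of pairwise task pattern similarities is already available. The function names and the toy matrix are invented for illustration, and leaving self-similarities out of the CVS is an assumption of this sketch; the exact conventions are those of the Cluster Verification routine described earlier in the report.

```python
from itertools import combinations, product


def cvs(sim, members):
    """Mean similarity over all distinct pairs within one cluster."""
    pairs = list(combinations(members, 2))
    return sum(sim[a][b] for a, b in pairs) / len(pairs)


def cds(sim, cluster_a, cluster_b):
    """Mean of all cells in the similarity matrix formed by two clusters."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(sim[a][b] for a, b in pairs) / len(pairs)


# Toy symmetric similarity matrix for four respondents (values invented).
sim = [
    [64, 30, 10, 12],
    [30, 64,  8, 14],
    [10,  8, 64, 28],
    [12, 14, 28, 64],
]
cluster_a, cluster_b = [0, 1], [2, 3]
print(cvs(sim, cluster_a), cvs(sim, cluster_b), cds(sim, cluster_a, cluster_b))
```

Because both scores are plain means of similarity indices, they are on the same scale and can be compared directly, which is the property the evaluation relies on.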



From an analysis of this matrix, it is possible to evaluate the 
cluster set to determine which clusters are most similar and which are 
most discrete. It is not the absolute cluster distance score (CDS) 
that is important in this evaluation; instead, it is the size of the 
CDS relative to the internal homogeneity (CVS) of the two clusters be- 
ing compared. In all comparisons, the CDS should be smaller than the 
mean similarity of either of the two clusters which make up the simi- 
larity distance matrix. If this were not the case (i.e., if the mean 
similarity between clusters were greater than that within clusters), 
the rationale for maintaining separate clusters would collapse. 

Table 7 shows there are no exceptions to this research expectation. 
Thus, there are more differences in task patterns between clusters 
than within clusters. 
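
This expectation can be checked directly against the figures of Table 7. The short script below is offered as a verification aid rather than as part of the original programs. It omits the (C1,C7) cell, whose last digit is not legible in the table; since that entry lies between 19.0 and 19.9 and the lower of the two CVS values involved is 20.8, it cannot change the outcome.

```python
# Intra-cluster similarity (CVS) for the eight clusters of Table 7.
CVS = {1: 25.4, 2: 25.9, 3: 20.6, 4: 21.3, 5: 24.5, 6: 22.4, 7: 20.8, 8: 21.7}

# Inter-cluster similarity (CDS), keyed by cluster pair; (1, 7) omitted.
CDS = {
    (1, 2): 11.4, (1, 3): 9.6, (1, 4): 11.7, (1, 5): 18.0, (1, 6): 13.2,
    (1, 8): 12.0,
    (2, 3): 8.8, (2, 4): 3.4, (2, 5): 21.0, (2, 6): 6.0, (2, 7): 15.4,
    (2, 8): 7.1,
    (3, 4): 11.7, (3, 5): 11.7, (3, 6): 17.3, (3, 7): 9.7, (3, 8): 18.2,
    (4, 5): 7.1, (4, 6): 19.6, (4, 7): 7.1, (4, 8): 17.0,
    (5, 6): 10.6, (5, 7): 19.0, (5, 8): 10.9,
    (6, 7): 9.7, (6, 8): 20.0,
    (7, 8): 10.0,
}

# A pair violates the expectation if its CDS reaches either cluster's CVS.
violations = [
    pair for pair, score in CDS.items()
    if score >= min(CVS[pair[0]], CVS[pair[1]])
]

# The six most dissimilar pairings (lowest CDS values).
six_lowest = sorted(CDS, key=CDS.get)[:6]

print("violations:", violations)
print("six lowest CDS pairs:", sorted(six_lowest))
```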



In some cluster pairings, the CDS indicates wide disparities in the
work performed [e.g., between pairs (C2,C3); (C2,C4); (C2,C6); (C2,C8);
(C4,C5); and (C4,C7)]. Of the 28 CDS cell entries for the eight
Propulsion/Auxiliary task area clusters, the six lowest values are
related to C2 pairings and C4 pairings. Conversely, of the nine most
TABLE 7

Cluster Distance Matrix for Eight Clusters
In the Propulsion/Auxiliary Task Area

Cluster        Intra-Cluster               Inter-Cluster Similarity (CDS)
Number     m   Similarity (CVS)       C2     C3     C4     C5     C6     C7     C8

C1        43        25.4            11.4    9.6   11.7   18.0   13.2   19.?   12.0
C2        21        25.9                    8.8    3.4   21.0    6.0   15.4    7.1
C3        32        20.6                          11.7   11.7   17.3    9.7   18.2
C4        16        21.3                                  7.1   19.6    7.1   17.0
C5        23        24.5                                        10.6   19.0   10.9
C6        21        22.4                                                9.7   20.0
C7        19        20.8                                                      10.0
C8        35        21.7
similar cluster pairings, three clusters (C5, C6, and C8) account for
half of the high CDS values. It is interesting to note that with one
exception, i.e., (C6,C8), the similarity of these three clusters among
themselves is not particularly great.

With the designation of optimum specialty clusters noted previously,
and the aid of output from the cluster distance program, it then becomes
possible to describe an occupational field or task area in terms of its
task pattern interaction. The relationships between relatively
homogeneous segments of work requirements can be best illustrated in an
n-dimensional space, which, unfortunately, is impossible on the planar
surface of this report. Nevertheless, Table 7 does indicate the
constituents of some of these relationships. For instance, a
macro-cluster can be developed from C4, C6, and C8, all of which have a
considerable amount of mutual task pattern similarity. On the other
hand, C2 appears to be relatively independent of all other clusters
except C5.

The task pattern relationships described above are influenced to a
very large degree by the source of the data. Inasmuch as the task
patterns were derived from engineering personnel on destroyers, the
similarities in tasks performed between individuals and between
clusters of nominally different occupational areas are much greater
than would be the case for other ship types or other work situations
(e.g., industrial occupations), where the division of labor and
specialization of functions are more prominent. Destroyers are
generally characterized by jobs which evidence a large amount of
overlapping in task patterns. Because of this, the specialty clusters
produced from a matrix of task pattern similarities reflect this
relative lack of specialization and are much more difficult to
separate clearly. However, this does not invalidate the clustering
process; the clusters produced by these techniques simply reflect the
way in which tasks are performed in a specific work situation.

V. RESEARCH APPLICATIONS OF COMPUTER CLUSTERING TECHNIQUES

The primary application for computer clustering techniques in this
research is in the area of task analysis. All of the data processing 
decisions and program designs have been directed toward the development 
of optimum specialty clusters. These clusters, which constitute groups 
of homogeneous task patterns, will be characterized by a series of 
technical, organizational, and communicational variables. By this 
process, clusters of work requirements will be developed — each cluster 
reflecting a particular profile of skills and knowledges. 

Computer clustering techniques are not limited to task analysis
alone. For instance, in the same research, the series of programs
associated with UCS is being employed to determine existing patterns of
communications networks in destroyers. With only minor modifications,
these same clustering programs will employ an input of "contact lists"
to produce clusters of communications patterns. A considerable amount
of the work in this area has heretofore been limited to experimental
networks of three to seven persons in a laboratory setting. With the
advent of more advanced and sophisticated techniques, such as UCS, it
becomes possible to test hypotheses about occupational and
organizational behavior in actual shipboard situations. These
homogeneous patterns of work contacts will be contrasted with
"official" designations of organizational structure and formal work
group arrangements, to determine cases of deviation and the
circumstances under which such deviation occurs.

Methods of computer clustering can be adapted to a wide range of
research problems in addition to the above. Problems of unidimensional
pattern recognition are especially suitable for UCS solution. In
particular, this assortment of clustering programs provides
quantitative criteria for research decisions that are frequently
arbitrary, or based on "estimates," in other research techniques.
This bibliography contains a selection of books, professional
articles, and other publications which focus on the problem of
defining, describing, measuring, and recognizing groupings of
entities. In the behavioral sciences, this interest would focus upon
one or more common features of human groups or patterns of human
behavior. But the techniques employed to classify, group, or cluster
humans on the basis of some criterion of similarity are not
necessarily different in kind from those techniques used on the same
type of problem by physicists, mathematicians, computer designers,
information theorists, and electronic engineers. Unfortunately, there
appears to be relatively little interaction on the part of scientists
from diverse disciplines who, nevertheless, are concerned with similar
technical problems.

The selection of publications which follows represents an attempt
to bring together some of the wide variety of literature concerned with
cluster analysis, pattern recognition, hierarchical grouping, factor
analysis, profile grouping, and other clustering, classifying, and
taxonomic techniques. Chronologically, only 25 percent of the items
listed were published prior to 1960, and there are no items dated
before 1949-50. Thus, the emphasis in this bibliography has been on the
currency of research. Further, the stress is on statistical techniques
(particularly those employing computerized procedures) rather than
non-quantitative methods of analysis. Most of the entries in this
bibliography have been reviewed in the course of developing the UCS
technique. However, there are a number of items which are still
unevaluated in terms of the research problem concerned in this report;
their inclusion is based on the possibility of stimulating greater
inter-disciplinary exchange than now exists.
APPENDIX A

Task List Questionnaire

TASK LIST INSTRUCTIONS
for
HULL AND REPAIR AREA

1. The Task List on the following pages should be filled out only
   by personnel working in the Hull and Repair area of the
   Engineering Department. This includes DC's, SF's, MR's, and
   strikers for these ratings. It also includes personnel of other
   ratings assigned to this area.

2. The Task List is divided into 11 subject headings.* Read the
   subject heading first to determine if the heading applies to
   your present work area.

   -If it applies to your work, then read each task below the
    heading and make an "X" on the line following each task
    if you have actually performed the task in your present
    assignment on this ship within the past 3 months.

   -If the heading does not apply to your work, go on to the
    next subject heading.

3. Many of the tasks contain several different parts. Be sure to
   check the task if you perform any of the parts, even though you
   do not perform all the parts.

4. Remember -

   -Do NOT check any tasks just because you "know how" to do
    them, or because you did them in school or in past duty
    assignments.

   -Do NOT check tasks which, during the past 3 months, you
    have supervised only.

   -Do NOT check tasks when you give only minor assistance,
    such as handing parts or tools to another man who is
    actually performing the task.

Do not hesitate to ask questions if you need assistance.

13. Perform angular, compound, and differential indexing; cut
    spur gears, T-slots and dovetails using milling machine.

14. Perform spline cutting and broaching; cut spur, bevel,
    helical and worm gears using milling machine.

15. Perform balancing machine operations.

16. Stow, lubricate, adjust, and clean shop equipment,
    machines and tools.

17. Lubricate machine tool bearings, guide-rollers, fittings
    and designated parts; fill oil holes and oil cups; and
    change oil.

18. Clean exposed surfaces of all machines and tools.

19. Check and adjust leveling of machine foundations.

20. Perform machining operations using lathe grinding
    attachments and milling attachments.

B. PIPEFITTING (Plumbing, Steamfitting, Pipe Covering, Piping
   and Valve Work)

 1. Make temporary repairs to pipe with plugs, clamps, plastic,
    or patches.

 2. Make permanent repairs to pipes with plugs (rivet or
    screws), welded or brazed patches, or by straightening
    and aligning.

 3. Replace piping sections and fittings.

 4. Layout and assemble sections of piping using templates
    and targets, pipe bending machines, and cutting-burring-
    threading machines.

 5. Hydrostatically test pipes, tubes, valves and fittings.

 6. Clean and flush piping and plumbing lines.

 7. Determine cause of troubles in flushing and firemain
    systems.

 8. Install, patch and repair pipe lagging and insulation, and
    molded pipe covering on steam, water and refrigeration
    lines.

APPENDIX B

Initial Cluster Program Output

Part 1. Partial Similarity Listing and Variance Listing
(Hull/Repair Task Area)

Part 2. Semi-Matrix (Hull/Repair Task Area)

Part 3. Frequency Distribution of Similarity Indices

APPENDIX C

Cluster Identification Analysis


The examination of potential specialty clusters in the initial 
program was accomplished with the aid of a special program labelled 
"cluster identification." This program arrays the cluster data in 
a table which facilitates visual examination of the structural char- 
acteristics of different clusters. This table shows the identifica- 
tion code of cluster members, their respective variances, their 
presence in one or more clusters and their similarity to the pivot 
in those clusters, and their status (whether clustered or unclus- 
tered, pivot or non-pivot). By comparing a series of these cluster 
identification tables and calculating a few summary statistics, the 
analysis of specialty clusters can proceed more effectively. 



Cluster identification tables were computed for different
similarity thresholds in the three task areas. Those computer runs in
which the control percentage was set at 10% for each of the three
task areas are shown on pages 6^-6. Table 8 contains a summary of
cluster identification data for 12 experimental cluster runs.

There are a number of observations that can be made through
analysis of Table 8. First, it is clear that as the control
percentage (CP) increases, the similarity threshold (ST) decreases.
The reason for this lies in the method of obtaining thresholds
from the similarity distribution, as noted previously. Note
also that the range in variances between the first cluster's pivot
(P1) and the last cluster's pivot (Pt) in a given run increases as
the ST decreases. Thus, in the Electrical area, the control
percentage set at 5% yields an ST=33 and a pivot range of 173-97,
while the percentage set at 20% yields an ST=25 and a pivot range
of 173-^41. In terms of the criteria for selection of specialty
clusters, the higher similarity thresholds result in greater
homogeneity in each cluster because of the more restrictive cluster
entry requirement, and also improve the quality of the pivots
associated with the "trailing" clusters because of their higher
variance.

Second, the number of clusters differs for each run. The
evaluation of this factor is based on the criterion concerned with
"optimizing" the number and size of clusters. In order to develop
specialty clusters, the initial clusters (C1) should not be
surfeited with personnel (as in the Propulsion/Auxiliary area run at
CP=20 corresponding to ST=16), nor should the "trailing" clusters
(Ct) be too small (as in the Propulsion/Auxiliary area run at CP=5
corresponding to ST=25). One can obtain a good idea of the kinds
of "trade-offs" required by the different cluster structures in
simply fulfilling the criteria of cluster number and size.
TABLE 8

Summary of Cluster Identification Tables

[Column headings: Program Run; CP (%); ST (64ths); Pivot Range (P1-Pt); Cluster; Difference]

Third, the number and percent of unclustered personnel (those
with similarities to pivots <ST) also differ for each run. In
the Hull/Repair area, for instance, the percent of personnel
unclustered runs from 64% at ST=38 to 33% at ST=32. Thus, with a
threshold difference of only 6/64ths, the percent of unclustered
personnel almost doubles. In selecting specialty clusters
it is desirable to minimize the percent of unclustered personnel so
that a major portion of the different task patterns sampled is
included in the cluster analysis. Table 8 shows the effect of
decreasing the percent of unclustered personnel: namely, reducing
the ST and, therefore, the degree of homogeneity in each cluster.
As with the other criteria of cluster "optimality," some trade-off
must be made.

Fourth, the number of multiple memberships must also be
considered in selecting specialty clusters. Multiple memberships occur
when an individual has a similarity >ST to more than one pivot, and
thereby becomes a member of more than one cluster. In some cluster
runs, the number of multiple cluster memberships can be quite high.
This is undesirable because it results in overlapping task patterns
among clusters and tends to dilute the homogeneity of the clusters
in which the individual appears. Table 8 shows the percent of
multiple memberships for each run increasing as the ST decreases. For
example, in the Propulsion/Auxiliary area the percent of multiple
membership runs from 44% at ST=25 to 72% at ST=16.
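
The trade-off between unclustered personnel and multiple memberships can be seen in a small sketch. It is not the cluster identification program: the function name and the similarity figures are invented, admission at a similarity equal to the ST is assumed, and both percentages are taken over the whole sample, which may differ from the base used in Table 8.

```python
def membership_summary(sim_to_pivots, st):
    """Percent of the sample left unclustered, and percent in more than one cluster.

    sim_to_pivots maps respondent id -> {pivot id: similarity index}.
    A respondent joins every cluster whose pivot he matches at or above `st`.
    """
    n = len(sim_to_pivots)
    counts = [
        sum(1 for s in sims.values() if s >= st)
        for sims in sim_to_pivots.values()
    ]
    unclustered = sum(1 for c in counts if c == 0)
    multiple = sum(1 for c in counts if c > 1)
    return {
        "pct_unclustered": 100 * unclustered / n,
        "pct_multiple": 100 * multiple / n,
    }


# Invented similarities (in 64ths) of five respondents to two pivots.
sample = {
    "a": {"P1": 40, "P2": 35},
    "b": {"P1": 36, "P2": 20},
    "c": {"P1": 18, "P2": 33},
    "d": {"P1": 30, "P2": 31},
    "e": {"P1": 12, "P2": 15},
}
high = membership_summary(sample, st=38)
low = membership_summary(sample, st=30)
print(high)
print(low)
```

Lowering the threshold in this toy sample draws more respondents into clusters but also produces overlapping memberships, which is the pattern described for the actual runs.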

On the basis of most criteria of cluster selection, the higher
thresholds seem to provide the "optimal" clusters. There is,
however, the problem of high numbers of unclustered personnel at those
thresholds. Figure 4 shows the interrelationships of some structural
features for different Propulsion/Auxiliary computer runs. The four
sets of cluster run characteristics charted in Figure 4 are linearly
related to CP (positive) and ST (negative).

As a result of the considerations noted above, an intensive
examination of the program logic behind the pivot and cluster
selection techniques led to some refinements in the pivot theory as
well as methods for obtaining optimum specialty clusters.
FIGURE 4

Relationship of Selected Structural
Features of the Initial Clustering Technique

APPENDIX E

Cluster Selection Listings

Propulsion/Auxiliary Task Area